R Data and Information

This is information on the R we are using.

sessionInfo()
## R version 3.3.1 Patched (2016-10-19 r71540)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X Yosemite 10.10.5
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] magrittr_1.5    tools_3.3.1     htmltools_0.3.5 yaml_2.1.13    
##  [5] Rcpp_0.12.6     stringi_1.1.1   rmarkdown_1.0   knitr_1.14     
##  [9] stringr_1.1.0   digest_0.6.10   evaluate_0.9

For our project we examined four different data sets found on the data.gov website. The data sets we examined were of Austin Salaries, Parish Salaries, and Austin Crime data from 2014 and 2016. In order to have the data ready to be imported into our SQL database, we scrubbed the data of all non-readible characters that SQL would reject.

URLs for data sets: Austin Employee data: https://catalog.data.gov/dataset/city-of-austin-employee-detail-information Baton Rouge Employee data: https://catalog.data.gov/dataset/city-parish-employees Austin Crime 2014 data: https://catalog.data.gov/dataset/annual-crime-2014 Austin Crime 2016 data: http://us-city.census.okfn.org/dataset/crime-stats

ETL File

Process
1. In the ETL file, we read in the csv file into R Studio and retrived the names of the columns of the data set.
2. We removed or replaced non-readable characters for each of the names.
3. Replaced “.” with “_“.
4. Replaced”-" with " “.
5. Replaced”&" with “AND”.
6. Replaced “/” with “OR”.
7. Reformatted the csv file with the revised name of the column.
8. Printed out SQL code to make the table for our data set in order to upload it to Oracle.

Below are the four different ETL files we created:

Austin, Texas Salaries:

#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austin_salaries_ETL.R", echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austin_salaries_ETL.R", echo = TRUE)
## 
## > require(tidyr)
## Loading required package: tidyr
## 
## > require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## > require(ggplot2)
## Loading required package: ggplot2
## 
## > setwd("~/dv_finalproject_rabinbhattarai/01 Data")
## 
## > file_path <- "austin_salaries.csv"
## 
## > salaries <- read.csv(file_path, stringsAsFactors = FALSE)
## 
## > df <- rename(salaries, first_name = First, last_name = Last)
## 
## > for (n in names(df)) {
## +     df[n] <- data.frame(lapply(df[n], gsub, pattern = "[^ -~]", 
## +         replacement = ""))
## + }
## 
## > str(df)
## 'data.frame':    15521 obs. of  29 variables:
##  $ Emp.ID                  : Factor w/ 15521 levels "10000003","10000008",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ last_name               : Factor w/ 7088 levels "Aalbers","Aarniokoski",..: 6477 4347 2504 6903 6817 940 4036 3884 6947 5390 ...
##  $ first_name              : Factor w/ 3379 levels "A J","Aakash",..: 858 151 1451 2909 1255 1596 1451 136 1236 2190 ...
##  $ MI                      : Factor w/ 2942 levels "","A","A M","A.",..: 645 1425 265 167 2805 1876 70 857 871 1767 ...
##  $ Department              : Factor w/ 37 levels "11E","15E","16E",..: 34 32 30 34 32 32 32 32 36 32 ...
##  $ Department.Name         : Factor w/ 36 levels "Animal Services",..: 20 32 23 20 32 32 32 32 15 32 ...
##  $ Division.No             : Factor w/ 504 levels "1.10E+01","1.10E+101",..: 469 440 353 461 440 440 440 440 501 440 ...
##  $ Division.Title          : Factor w/ 484 levels "AAQL","Accounting",..: 477 400 278 240 400 400 400 400 303 400 ...
##  $ Effective.Date          : Factor w/ 651 levels "1/1/2012","1/1/2015",..: 626 173 480 608 413 48 440 626 574 52 ...
##  $ Length.of.Service.w.City: Factor w/ 42 levels "0","1","10","11",..: 8 11 16 20 10 15 9 8 11 6 ...
##  $ Length.of.Service.in.Job: Factor w/ 26 levels "0","1","10","11",..: 9 13 10 4 20 4 13 2 24 23 ...
##  $ Staffing.Level          : Factor w/ 5 levels "1","2","3","4",..: 2 3 2 1 2 1 3 1 2 1 ...
##  $ Staffing.Level.Desc     : Factor w/ 6 levels "","Council Appointed",..: 5 4 5 6 5 6 4 6 5 6 ...
##  $ Posn                    : Factor w/ 12106 levels "100001","100002",..: 4540 2359 6528 3454 2468 5123 2377 2495 4390 8195 ...
##  $ Title                   : Factor w/ 1168 levels "727","911 Call Taker",..: 1095 822 571 18 829 823 826 823 417 828 ...
##  $ Job.Status              : Factor w/ 3 levels "A","B","L": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Employee.Status         : Factor w/ 4 levels "A","B","L","T": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ECLS.Code               : Factor w/ 21 levels "1","12","13",..: 9 8 9 1 15 15 8 15 13 15 ...
##  $ ECLS.Desc               : Factor w/ 21 levels "City Councilmembers",..: 17 13 17 18 14 14 13 14 4 14 ...
##  $ Job.FTE                 : Factor w/ 28 levels "0.05","0.125",..: 23 23 23 23 23 23 23 23 23 23 ...
##  $ Job.Hrs.Pay             : Factor w/ 24 levels "10","106","12",..: 21 21 21 21 21 21 21 21 22 21 ...
##  $ Hourly.Rate             : Factor w/ 2566 levels "$0.00","$10.00",..: 1337 2501 2141 735 2311 2215 2441 2104 2007 1715 ...
##  $ Annual.Salary           : Factor w/ 2879 levels "$0.00","$1,560.00",..: 1936 355 2817 1310 124 24 274 2780 2776 2343 ...
##  $ EEO.Code                : Factor w/ 8 levels "10","20","30",..: 2 2 2 6 4 4 4 4 2 4 ...
##  $ EEO.Desc                : Factor w/ 8 levels "Admin/Supp","Official/Adm",..: 4 4 4 1 5 5 5 5 4 5 ...
##  $ Age                     : Factor w/ 75 levels "0","14","15",..: 38 34 38 45 28 33 38 29 43 27 ...
##  $ Ethnicity               : Factor w/ 7 levels "","American Indian/Aleutian",..: 7 7 7 5 7 7 7 7 7 7 ...
##  $ Ethnicity.Code          : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 3 1 1 1 1 1 1 ...
##  $ Gender                  : Factor w/ 2 levels "F","M": 1 2 2 1 2 1 2 1 2 2 ...
## 
## > str(salaries)
## 'data.frame':    15521 obs. of  29 variables:
##  $ Emp.ID                  : int  10000003 10000008 10000015 10000022 10000026 10000039 10000066 10000084 10000127 10000132 ...
##  $ Last                    : chr  "Turner" "Michael" "Greco" "Wiswell-DeCampo" ...
##  $ First                   : chr  "Diane" "Andrew" "James" "Sherryl" ...
##  $ MI                      : chr  "Dallas" "Joseph" "B" "Ann" ...
##  $ Department              : chr  "91G" "87G" "85G" "91G" ...
##  $ Department.Name         : chr  "Health & Human Services" "Police" "Library" "Health & Human Services" ...
##  $ Division.No             : chr  "91G427" "87G010" "85G100" "91G100" ...
##  $ Division.Title          : chr  "Womens,Infant,Children (WIC)" "Police Law Enforcement" "Lib Information Systems" "HHSD Office of the Director" ...
##  $ Effective.Date          : chr  "9/21/2014" "12/20/2014" "6/28/2015" "8/9/2015" ...
##  $ Length.of.Service.w.City: int  15 18 22 26 17 21 16 15 18 13 ...
##  $ Length.of.Service.in.Job: int  16 2 17 11 3 11 2 1 7 6 ...
##  $ Staffing.Level          : int  2 3 2 1 2 1 3 1 2 1 ...
##  $ Staffing.Level.Desc     : chr  "Supervisor" "Manager" "Supervisor" "Worker" ...
##  $ Posn                    : chr  "105664" "102924" "108449" "104243" ...
##  $ Title                   : chr  "Supv, WIC" "Police Commander" "IT Supervisor Sr" "Administrative Specialist" ...
##  $ Job.Status              : chr  "A" "A" "A" "A" ...
##  $ Employee.Status         : chr  "A" "A" "A" "A" ...
##  $ ECLS.Code               : int  2 19 2 1 3 3 19 3 23 3 ...
##  $ ECLS.Desc               : chr  "Standard/Exempt" "Police/Exempt" "Standard/Exempt" "Standard/Non-Exempt" ...
##  $ Job.FTE                 : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Job.Hrs.Pay             : int  80 80 80 80 80 80 80 80 84 80 ...
##  $ Hourly.Rate             : chr  "$30.10" "$70.24" "$46.32" "$21.96" ...
##  $ Annual.Salary           : chr  "$62,608.00" "$146,101.28" "$96,345.60" "$45,676.80" ...
##  $ EEO.Code                : int  20 20 20 60 40 40 40 40 20 40 ...
##  $ EEO.Desc                : chr  "Professionals" "Professionals" "Professionals" "Admin/Supp" ...
##  $ Age                     : int  50 46 50 57 40 45 50 41 55 39 ...
##  $ Ethnicity               : chr  "White" "White" "White" "Hispanic" ...
##  $ Ethnicity.Code          : int  1 1 1 3 1 1 1 1 1 1 ...
##  $ Gender                  : chr  "F" "M" "M" "F" ...
## 
## > measures <- c("Length.of.Service.w.City", "Length.of.Service.in.Job", 
## +     "Job.Hrs.Pay", "Age", "Hourly.Rate", "Annual.Salary", "Emp.ID", 
## +      .... [TRUNCATED] 
## 
## > dimensions <- setdiff(names(df), measures)
## 
## > dimensions
##  [1] "last_name"           "first_name"          "MI"                 
##  [4] "Department"          "Department.Name"     "Division.No"        
##  [7] "Division.Title"      "Effective.Date"      "Staffing.Level.Desc"
## [10] "Posn"                "Title"               "Job.Status"         
## [13] "Employee.Status"     "ECLS.Code"           "ECLS.Desc"          
## [16] "Job.FTE"             "EEO.Code"            "EEO.Desc"           
## [19] "Ethnicity"           "Ethnicity.Code"      "Gender"             
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         if (d == "Effective.Date") 
## +             df[d] <- as.Date .... [TRUNCATED] 
## 
## > if (length(measures) > 1 || !is.na(measures)) {
## +     for (m in measures) {
## +         df[m] <- data.frame(lapply(df[m], gsub, pattern = "[^--.0-9]", .... [TRUNCATED] 
## 
## > write.csv(df, paste(gsub(".csv", "", file_path), ".reformatted.csv", 
## +     sep = ""), row.names = FALSE, na = "")
## 
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]", 
## +     "", gsub(".csv", "", file_path)))
## 
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         if (d == "Effective.Date") 
## +             sql <- paste(sql .... [TRUNCATED] 
## 
## > if (length(measures) > 1 || !is.na(measures)) {
## +     for (m in measures) {
## +         if (m != tail(measures, n = 1)) 
## +             sql <- paste(sq .... [TRUNCATED] 
## 
## > sql <- paste(sql, ");")
## 
## > cat(sql)
## CREATE TABLE austin_salaries (
## -- Change table_name to the table name you want.
##  last_name varchar2(4000),
##  first_name varchar2(4000),
##  MI varchar2(4000),
##  Department varchar2(4000),
##  Department.Name varchar2(4000),
##  Division.No varchar2(4000),
##  Division.Title varchar2(4000),
##  Effective.Date date,
##  Staffing.Level.Desc varchar2(4000),
##  Posn varchar2(4000),
##  Title varchar2(4000),
##  Job.Status varchar2(4000),
##  Employee.Status varchar2(4000),
##  ECLS.Code varchar2(4000),
##  ECLS.Desc varchar2(4000),
##  Job.FTE varchar2(4000),
##  EEO.Code varchar2(4000),
##  EEO.Desc varchar2(4000),
##  Ethnicity varchar2(4000),
##  Ethnicity.Code varchar2(4000),
##  Gender varchar2(4000),
##  Length.of.Service.w.City number(38,4),
##  Length.of.Service.in.Job number(38,4),
##  Job.Hrs.Pay number(38,4),
##  Age number(38,4),
##  Hourly.Rate number(38,4),
##  Annual.Salary number(38,4),
##  Emp.ID number(38,4),
##  Staffing.Level number(38,4)
##  );

Parish, Louisiana Salaries

#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/parish_salaries_ETL.R", echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/parish_salaries_ETL.R", echo = TRUE)
## 
## > require(tidyr)
## 
## > require(dplyr)
## 
## > require(ggplot2)
## 
## > setwd("~/dv_finalproject_rabinbhattarai/01 Data")
## 
## > file_path <- "City-Parish_Employees.csv"
## 
## > df <- read.csv(file_path, stringsAsFactors = FALSE)
## 
## > names(df) <- gsub("\\.+", "_", names(df))
## 
## > str(df)
## 'data.frame':    23300 obs. of  24 variables:
##  $ EMPLOYEE_NUM        : int  526479 513040 499447 365831 476749 449784 487112 505145 526509 526517 ...
##  $ LAST_NAME           : chr  "HALL" "HARRISON" "THIBODEAUX" "HOUSTON" ...
##  $ FIRST_NAME          : chr  "LOUIS" "KEITH" "SAMANTHA" "ASHLEY" ...
##  $ MIDDLE_INITIAL      : chr  "" "D" "L" "L" ...
##  $ DEPARTMENT_NUM      : int  77 70 60 70 41 41 41 77 50 79 ...
##  $ DEPARTMENT_NAME     : chr  "DEPT OF MAINTENANCE" "DEPARTMENT OF PUBLIC WORKS" "HUMAN DEVELOPMENT AND SERVICES" "DEPARTMENT OF PUBLIC WORKS" ...
##  $ DIVISION_NUM        : int  7702002 7031102 6044706 7070303 4120000 4120000 4120000 7702002 5050001 7901000 ...
##  $ DIVISION_NAME       : chr  "MAINTENANCE-LANDSCAPE MAINT" "DPW-EAST LOT - STREETS & ROADS" "COMMUNITY SERVICES BLOCK GRANT-ADMIN" "SEWERAGE-WASTEWATER COLLECTION" ...
##  $ JOB_CODE            : int  160105 151375 110620 151750 100160 100160 100160 160105 110090 141115 ...
##  $ JOB_TITLE           : chr  "LABORER" "MAINTENANCE WORKER II" "ADMINISTRATIVE SPECIALIST I" "COMPLAINT INVESTIGATOR" ...
##  $ PAY_RANGE           : int  1040 1080 1110 1110 2260 2260 2260 1040 1070 1140 ...
##  $ PAY_STEP            : int  5 1 3 13 10 12 10 5 2 6 ...
##  $ SEX                 : chr  "M" "M" "F" "M" ...
##  $ RACE                : chr  "B" "B" "B" "B" ...
##  $ CURRENT_HIRE_DATE   : chr  "07/11/2016" "09/22/2014" "12/26/2012" "11/24/2004" ...
##  $ EMPLOYMENT_END_DATE : chr  "" "12/19/2014" "" "" ...
##  $ YEARS_SERVICE       : int  0 0 3 11 5 7 5 0 0 0 ...
##  $ SCHEDULED_HOURS     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LONGEVITY_PERCENTAGE: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ HOURLY_RATE         : num  9.3 9.79 12.26 16.48 31.35 ...
##  $ TOTAL_HOURLY_RATE   : num  9.3 9.79 12.26 16.48 31.35 ...
##  $ OVERTIME_HOURLY_RATE: num  14 14.7 18.4 24.7 47 ...
##  $ ANNUAL_SALARY       : chr  "$19351.02" "$20359.04" "$25503.14" "$34273.98" ...
##  $ EMPLOYMENT_STATUS   : int  0 5 0 0 0 0 0 0 0 0 ...
## 
## > measures <- c("YEARS_SERVICE", "TOTAL_HOURLY_RATE", 
## +     "ANNUAL_SALARY", "YEARS_SERVICE")
## 
## > for (n in names(df)) {
## +     df[n] <- data.frame(lapply(df[n], gsub, pattern = "[^ -~]", 
## +         replacement = ""))
## + }
## 
## > dimensions <- setdiff(names(df), measures)
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         df[d] <- data.frame(lapply(df[d], gsub, pattern = "[\"']", ..." ... [TRUNCATED] 
## 
## > library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
## 
## > if (length(measures) > 1 || !is.na(measures)) {
## +     for (m in measures) {
## +         df[m] <- data.frame(lapply(df[m], gsub, pattern = "[^--.0-9]", .... [TRUNCATED] 
## 
## > write.csv(df, paste(gsub(".csv", "", file_path), ".reformatted.csv", 
## +     sep = ""), row.names = FALSE, na = "")
## 
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]", 
## +     "", gsub(".csv", "", file_path)))
## 
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         sql <- paste(sql, paste(d, "varchar2(4000),\n"))
## +     }
## + .... [TRUNCATED] 
## 
## > if (length(measures) > 1 || !is.na(measures)) {
## +     for (m in measures) {
## +         if (m != tail(measures, n = 1)) 
## +             sql <- paste(sq .... [TRUNCATED] 
## 
## > sql <- paste(sql, ");")
## 
## > cat(sql)
## CREATE TABLE CityParish_Employees (
## -- Change table_name to the table name you want.
##  EMPLOYEE_NUM varchar2(4000),
##  LAST_NAME varchar2(4000),
##  FIRST_NAME varchar2(4000),
##  MIDDLE_INITIAL varchar2(4000),
##  DEPARTMENT_NUM varchar2(4000),
##  DEPARTMENT_NAME varchar2(4000),
##  DIVISION_NUM varchar2(4000),
##  DIVISION_NAME varchar2(4000),
##  JOB_CODE varchar2(4000),
##  JOB_TITLE varchar2(4000),
##  PAY_RANGE varchar2(4000),
##  PAY_STEP varchar2(4000),
##  SEX varchar2(4000),
##  RACE varchar2(4000),
##  CURRENT_HIRE_DATE varchar2(4000),
##  EMPLOYMENT_END_DATE varchar2(4000),
##  SCHEDULED_HOURS varchar2(4000),
##  LONGEVITY_PERCENTAGE varchar2(4000),
##  HOURLY_RATE varchar2(4000),
##  OVERTIME_HOURLY_RATE varchar2(4000),
##  EMPLOYMENT_STATUS varchar2(4000),
##  YEARS_SERVICE number(38,4)
##  TOTAL_HOURLY_RATE number(38,4),
##  ANNUAL_SALARY number(38,4),
##  YEARS_SERVICE number(38,4)
##  );

Austin, Texas Crime 2014:

#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austincrime14_ETL.R", echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austincrime14_ETL.R", echo = TRUE)
## 
## > require(tidyr)
## 
## > require(dplyr)
## 
## > require(ggplot2)
## 
## > setwd("~/dv_finalproject_rabinbhattarai/01 Data")
## 
## > file_path <- "AustinCrime14.csv"
## 
## > atxCrime14 <- read.csv(file_path, stringsAsFactors = FALSE)
## 
## > names(atxCrime14)
##  [1] "GO.Primary.Key"                       
##  [2] "Council.District"                     
##  [3] "GO.Highest.Offense.Desc"              
##  [4] "Highest.NIBRS.UCR.Offense.Description"
##  [5] "GO.Report.Date"                       
##  [6] "GO.Location"                          
##  [7] "Clearance.Status"                     
##  [8] "Clearance.Date"                       
##  [9] "GO.District"                          
## [10] "GO.Location.Zip"                      
## [11] "GO.Census.Tract"                      
## [12] "GO.X.Coordinate"                      
## [13] "GO.Y.Coordinate"                      
## [14] "Location_1"                           
## 
## > str(atxCrime14)
## 'data.frame':    40641 obs. of  14 variables:
##  $ GO.Primary.Key                       : num  2.01e+10 2.01e+10 2.01e+10 2.01e+10 2.01e+10 ...
##  $ Council.District                     : int  1 1 3 1 3 1 9 4 3 9 ...
##  $ GO.Highest.Offense.Desc              : chr  "AGG ROBBERY/DEADLY WEAPON" "ROBBERY BY ASSAULT" "ROBBERY BY THREAT" "AGG ROBBERY/DEADLY WEAPON" ...
##  $ Highest.NIBRS.UCR.Offense.Description: chr  "Robbery" "Robbery" "Robbery" "Robbery" ...
##  $ GO.Report.Date                       : chr  "4/17/2014" "4/25/2014" "5/11/2014" "6/16/2014" ...
##  $ GO.Location                          : chr  "12151 N IH 35 SVRD NB" "3300 BLOCK ROCKHURST LN" "E 7TH ST / CHICON ST" "WHELESS LN / BERKMAN DR" ...
##  $ Clearance.Status                     : chr  "N" "N" "N" "C" ...
##  $ Clearance.Date                       : chr  "4/28/2014 0:00:00" "5/20/2014 0:00:00" "5/13/2014 0:00:00" "3/24/2015 0:00:00" ...
##  $ GO.District                          : chr  "E" "I" "C" "I" ...
##  $ GO.Location.Zip                      : int  78753 78723 78702 78723 78702 78722 78701 78753 78741 78701 ...
##  $ GO.Census.Tract                      : num  18.35 21.13 9.02 21.04 9.02 ...
##  $ GO.X.Coordinate                      : int  3135985 3137985 3120890 3130566 3117732 3122536 3114310 3127097 3127162 3116520 ...
##  $ GO.Y.Coordinate                      : int  10117220 10087946 10068910 10089446 10068195 10075649 10070753 10097584 10055519 10070158 ...
##  $ Location_1                           : chr  "12151 N IH 35 SVRD NB\nAustin, TX 78753\n" "3300 BLOCK ROCKHURST LN\nAustin, TX 78723\n" "E 7TH ST / CHICON ST\nAustin, TX 78702\n" "WHELESS LN / BERKMAN DR\nAustin, TX 78723\n" ...
## 
## > View(atxCrime14)
## 
## > df <- rename(atxCrime14, Crime_Type = GO.Highest.Offense.Desc, 
## +     Date_of_crime = GO.Report.Date, zip_code = GO.Location.Zip, 
## +     address = G .... [TRUNCATED] 
## 
## > str(df)
## 'data.frame':    40641 obs. of  14 variables:
##  $ GO.Primary.Key                       : num  2.01e+10 2.01e+10 2.01e+10 2.01e+10 2.01e+10 ...
##  $ Council.District                     : int  1 1 3 1 3 1 9 4 3 9 ...
##  $ Crime_Type                           : chr  "AGG ROBBERY/DEADLY WEAPON" "ROBBERY BY ASSAULT" "ROBBERY BY THREAT" "AGG ROBBERY/DEADLY WEAPON" ...
##  $ Highest.NIBRS.UCR.Offense.Description: chr  "Robbery" "Robbery" "Robbery" "Robbery" ...
##  $ Date_of_crime                        : chr  "4/17/2014" "4/25/2014" "5/11/2014" "6/16/2014" ...
##  $ address                              : chr  "12151 N IH 35 SVRD NB" "3300 BLOCK ROCKHURST LN" "E 7TH ST / CHICON ST" "WHELESS LN / BERKMAN DR" ...
##  $ Clearance.Status                     : chr  "N" "N" "N" "C" ...
##  $ Clearance.Date                       : chr  "4/28/2014 0:00:00" "5/20/2014 0:00:00" "5/13/2014 0:00:00" "3/24/2015 0:00:00" ...
##  $ GO.District                          : chr  "E" "I" "C" "I" ...
##  $ zip_code                             : int  78753 78723 78702 78723 78702 78722 78701 78753 78741 78701 ...
##  $ GO.Census.Tract                      : num  18.35 21.13 9.02 21.04 9.02 ...
##  $ GO.X.Coordinate                      : int  3135985 3137985 3120890 3130566 3117732 3122536 3114310 3127097 3127162 3116520 ...
##  $ GO.Y.Coordinate                      : int  10117220 10087946 10068910 10089446 10068195 10075649 10070753 10097584 10055519 10070158 ...
##  $ Location_1                           : chr  "12151 N IH 35 SVRD NB\nAustin, TX 78753\n" "3300 BLOCK ROCKHURST LN\nAustin, TX 78723\n" "E 7TH ST / CHICON ST\nAustin, TX 78702\n" "WHELESS LN / BERKMAN DR\nAustin, TX 78723\n" ...
## 
## > for (n in names(df)) {
## +     df[n] <- data.frame(lapply(df[n], gsub, pattern = "[^ -~]", 
## +         replacement = ""))
## + }
## 
## > str(df)
## 'data.frame':    40641 obs. of  14 variables:
##  $ GO.Primary.Key                       : Factor w/ 40641 levels "20141000078",..: 493 1046 2046 4271 6947 7178 8744 10165 10410 10574 ...
##  $ Council.District                     : Factor w/ 10 levels "1","10","2","3",..: 1 1 4 1 4 1 10 5 4 10 ...
##  $ Crime_Type                           : Factor w/ 44 levels "AGG ASLT ENHANC STRANGL/SUFFOC",..: 11 30 31 11 11 30 30 31 11 30 ...
##  $ Highest.NIBRS.UCR.Offense.Description: Factor w/ 14 levels "Aggravated Assault",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ Date_of_crime                        : Factor w/ 365 levels "1/1/2014","1/10/2014",..: 191 200 215 251 292 14 317 342 345 349 ...
##  $ address                              : Factor w/ 17327 levels "","1 OLMOS DR",..: 2378 8045 16781 17315 17299 4733 16764 14267 4734 13160 ...
##  $ Clearance.Status                     : Factor w/ 4 levels "","C","N","O": 3 3 3 2 3 2 3 2 3 3 ...
##  $ Clearance.Date                       : Factor w/ 498 levels "","1/1/2014 0:00:00",..: 299 336 324 240 75 196 457 104 78 69 ...
##  $ GO.District                          : Factor w/ 11 levels "A","AP","B","C",..: 6 10 4 10 8 4 8 6 9 8 ...
##  $ zip_code                             : Factor w/ 48 levels "78610","78613",..: 43 17 8 17 8 16 7 43 32 7 ...
##  $ GO.Census.Tract                      : Factor w/ 207 levels "1.01","1.02",..: 88 146 206 137 206 195 4 68 162 4 ...
##  $ GO.X.Coordinate                      : Factor w/ 14523 levels "3057251","3058024",..: 13311 13555 9061 12109 7939 9653 6487 11115 11131 7474 ...
##  $ GO.Y.Coordinate                      : Factor w/ 15553 levels "10010316","10010757",..: 13964 10081 6356 10328 6146 7945 6970 11408 3798 6738 ...
##  $ Location_1                           : Factor w/ 17369 levels "1 OLMOS DRAustin, TX 78744(30.21909, -97.6874)",..: 2378 8048 16823 17357 17341 4734 16806 14272 4735 13165 ...
## 
## > measures <- c("GO.X.Coordinate", "GO.Y.Coordinate")
## 
## > dimensions <- c("Date_of_crime", "address", "zip_code", 
## +     "Crime_Type")
## 
## > dimensions
## [1] "Date_of_crime" "address"       "zip_code"      "Crime_Type"   
## 
## > View(df)
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         if (d == "Date_of_crime") 
## +             df[d] <- as.Date( .... [TRUNCATED] 
## 
## > View(df)
## 
## > if (length(measures) > 1 || !is.na(measures)) {
## +     for (m in measures) {
## +         df[m] <- data.frame(lapply(df[m], gsub, pattern = "[^--.0-9]", .... [TRUNCATED] 
## 
## > View(df)
## 
## > write.csv(df, paste(gsub(".csv", "", file_path), ".reformatted.csv", 
## +     sep = ""), row.names = FALSE, na = "")
## 
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]", 
## +     "", gsub(".csv", "", file_path)))
## 
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         if (d == "Date_of_crime") 
## +             sql <- paste(sql, .... [TRUNCATED] 
## 
## > if (length(measures) > 1 || !is.na(measures)) {
## +     for (m in measures) {
## +         if (m != tail(measures, n = 1)) 
## +             sql <- paste(sq .... [TRUNCATED] 
## 
## > sql <- paste(sql, ");")
## 
## > cat(sql)
## CREATE TABLE AustinCrime14 (
## -- Change table_name to the table name you want.
##  Date_of_crime date,
##  address varchar2(4000),
##  zip_code varchar2(4000),
##  Crime_Type varchar2(4000),
##  GO.X.Coordinate number(38,4),
##  GO.Y.Coordinate number(38,4)
##  );

Austin, Texas Crime 2016:

#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austincrime_ETL.R", echo = TRUE)

source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austincrime_ETL.R", echo = TRUE)
## 
## > require(tidyr)
## 
## > require(dplyr)
## 
## > require(ggplot2)
## 
## > setwd("~/dv_finalproject_rabinbhattarai/01 Data")
## 
## > file_path <- "AustinCrime.csv"
## 
## > atxCrime <- read.csv(file_path, stringsAsFactors = FALSE)
## 
## > names(atxCrime)
##  [1] "Incident.Report.Number" "Crime.Type"            
##  [3] "Date"                   "Time"                  
##  [5] "LOCATION_TYPE"          "ADDRESS"               
##  [7] "LONGITUDE"              "LATITUDE"              
##  [9] "State"                  "Country"               
## [11] "Continent"             
## 
## > str(atxCrime)
## 'data.frame':    7015 obs. of  11 variables:
##  $ Incident.Report.Number: num  2.02e+08 2.02e+10 2.02e+10 2.02e+09 2.02e+08 ...
##  $ Crime.Type            : chr  "AGG ASSAULT" "ASSAULT INFORMATION" "THEFT" "ASSAULT INFORMATION" ...
##  $ Date                  : chr  "1/2/2016" "4/22/2016" "5/14/2016" "3/21/2016" ...
##  $ Time                  : int  557 1800 2000 2126 1900 2300 1528 1417 0 6 ...
##  $ LOCATION_TYPE         : logi  NA NA NA NA NA NA ...
##  $ ADDRESS               : chr  "W MARTIN LUTHER KING JR BLVD / NUECES ST" "MANOR RD / ROGGE LN" "WHELESS LN / LINDA LN" "PROCK LN / SPRINGDALE RD" ...
##  $ LONGITUDE             : num  -97.7 -97.7 -97.7 -97.7 -97.7 ...
##  $ LATITUDE              : num  30.3 30.3 30.3 30.3 30.3 ...
##  $ State                 : chr  "Texas" "Texas" "Texas" "Texas" ...
##  $ Country               : chr  "United States of America" "United States of America" "United States of America" "United States of America" ...
##  $ Continent             : chr  "North America" "North America" "North America" "North America" ...
## 
## > View(atxCrime)
## 
## > df <- rename(atxCrime, Crime_Type = Crime.Type, Incident_report_number = Incident.Report.Number)
## 
## > str(df)
## 'data.frame':    7015 obs. of  11 variables:
##  $ Incident_report_number: num  2.02e+08 2.02e+10 2.02e+10 2.02e+09 2.02e+08 ...
##  $ Crime_Type            : chr  "AGG ASSAULT" "ASSAULT INFORMATION" "THEFT" "ASSAULT INFORMATION" ...
##  $ Date                  : chr  "1/2/2016" "4/22/2016" "5/14/2016" "3/21/2016" ...
##  $ Time                  : int  557 1800 2000 2126 1900 2300 1528 1417 0 6 ...
##  $ LOCATION_TYPE         : logi  NA NA NA NA NA NA ...
##  $ ADDRESS               : chr  "W MARTIN LUTHER KING JR BLVD / NUECES ST" "MANOR RD / ROGGE LN" "WHELESS LN / LINDA LN" "PROCK LN / SPRINGDALE RD" ...
##  $ LONGITUDE             : num  -97.7 -97.7 -97.7 -97.7 -97.7 ...
##  $ LATITUDE              : num  30.3 30.3 30.3 30.3 30.3 ...
##  $ State                 : chr  "Texas" "Texas" "Texas" "Texas" ...
##  $ Country               : chr  "United States of America" "United States of America" "United States of America" "United States of America" ...
##  $ Continent             : chr  "North America" "North America" "North America" "North America" ...
## 
## > for (n in names(df)) {
## +     df[n] <- data.frame(lapply(df[n], gsub, pattern = "[^ -~]", 
## +         replacement = ""))
## + }
## 
## > View(df)
## 
## > str(df)
## 'data.frame':    7015 obs. of  11 variables:
##  $ Incident_report_number: Factor w/ 7015 levels "20161000059",..: 2072 294 4760 6640 6028 6457 754 442 6145 6962 ...
##  $ Crime_Type            : Factor w/ 212 levels "ABAND VEH  (STORAGE FACILITY)",..: 9 25 176 25 27 179 132 132 132 176 ...
##  $ Date                  : Factor w/ 257 levels "1/1/2016","1/10/2016",..: 12 123 145 89 26 37 149 140 98 134 ...
##  $ Time                  : Factor w/ 1318 levels "0","1","10","100",..: 1126 525 646 743 585 857 366 284 1 1131 ...
##  $ LOCATION_TYPE         : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
##  $ ADDRESS               : Factor w/ 3116 levels "0 N MOPAC NB AT MOPAC RAILRO TRN",..: 2874 1490 3074 2003 342 733 785 418 571 561 ...
##  $ LONGITUDE             : Factor w/ 2336 levels "-0.0001335584",..: 1447 228 330 367 481 1013 1171 1128 900 661 ...
##  $ LATITUDE              : Factor w/ 2336 levels "0.0001204538",..: 1275 1529 1637 1061 1762 865 1003 682 1127 1162 ...
##  $ State                 : Factor w/ 1 level "Texas": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Country               : Factor w/ 1 level "United States of America": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Continent             : Factor w/ 1 level "North America": 1 1 1 1 1 1 1 1 1 1 ...
## 
## > measures <- c("LATITUDE", "LONGITUDE")
## 
## > dimensions <- setdiff(names(df), measures)
## 
## > dimensions
## [1] "Incident_report_number" "Crime_Type"            
## [3] "Date"                   "Time"                  
## [5] "LOCATION_TYPE"          "ADDRESS"               
## [7] "State"                  "Country"               
## [9] "Continent"             
## 
## > measures
## [1] "LATITUDE"  "LONGITUDE"
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         if (d == "Date") 
## +             df[d] <- as.Date(atxCrime$ .... [TRUNCATED] 
## 
## > View(df)
## 
## > if (length(measures) > 1 || !is.na(measures)) {
## +     for (m in measures) {
## +         df[m] <- data.frame(lapply(df[m], gsub, pattern = "[^--.0-9]", .... [TRUNCATED] 
## 
## > View(df)
## 
## > write.csv(df, paste(gsub(".csv", "", file_path), ".reformatted.csv", 
## +     sep = ""), row.names = FALSE, na = "")
## 
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]", 
## +     "", gsub(".csv", "", file_path)))
## 
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         if (d == "Date") 
## +             sql <- paste(sql, paste(d, .... [TRUNCATED] 
## 
## > if (length(measures) > 1 || !is.na(measures)) {
## +     for (m in measures) {
## +         if (m != tail(measures, n = 1)) 
## +             sql <- paste(sq .... [TRUNCATED] 
## 
## > sql <- paste(sql, ");")
## 
## > cat(sql)
## CREATE TABLE AustinCrime (
## -- Change table_name to the table name you want.
##  Incident_report_number varchar2(4000),
##  Crime_Type varchar2(4000),
##  Date date,
##  Time varchar2(4000),
##  LOCATION_TYPE varchar2(4000),
##  ADDRESS varchar2(4000),
##  State varchar2(4000),
##  Country varchar2(4000),
##  Continent varchar2(4000),
##  LATITUDE number(38,4),
##  LONGITUDE number(38,4)
##  );

Summary and Subset

Next, we created the data frame we needed by extracting the information from SQL. Below is the code importing the dataset from Oracle to make the dataframe along with a summary and a subset of the datasets:

Parish Salaries summary and subset of the fire department who have a salary greater than $10,000:

#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/Parish_summary_subset.R", echo = TRUE)


source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/Parish_summary_subset.R", echo = TRUE)
## 
## > require("jsonlite")
## Loading required package: jsonlite
## 
## > require("RCurl")
## Loading required package: RCurl
## Loading required package: bitops
## 
## Attaching package: 'RCurl'
## The following object is masked from 'package:tidyr':
## 
##     complete
## 
## > PARISH_SALARIES_df <- data.frame(fromJSON(getURL(URLencode("oraclerest.cs.utexas.edu:5001/rest/native/?query=\"select * from PARISH_SALARIES\""), 
## + .... [TRUNCATED] 
## 
## > summary(PARISH_SALARIES_df)
##   EMPLOYEE_NUM       LAST_NAME       FIRST_NAME    MIDDLE_INITIAL 
##  Min.   :    78   WILLIAMS:  464   MICHAEL:  335   null   : 2317  
##  1st Qu.:324963   JOHNSON :  400   JAMES  :  326   L      : 2316  
##  Median :397632   BROWN   :  269   JOHN   :  305   M      : 2223  
##  Mean   :372329   SMITH   :  264   ROBERT :  256   D      : 2001  
##  3rd Qu.:464416   JACKSON :  240   CHARLES:  216   A      : 1922  
##  Max.   :528277   JONES   :  240   JOSEPH :  189   J      : 1840  
##                   (Other) :21423   (Other):21673   (Other):10681  
##  DEPARTMENT_NUM                        DEPARTMENT_NAME  DIVISION_NUM    
##  Min.   : 1.00   DEPARTMENT OF PUBLIC WORKS    :4425   Min.   : 120010  
##  1st Qu.:40.00   HUMAN DEVELOPMENT AND SERVICES:4186   1st Qu.:4020116  
##  Median :51.00   POLICE DEPARTMENT             :2730   Median :5130001  
##  Mean   :47.22   LIBRARY BOARD OF CONTROL      :2487   Mean   :4749240  
##  3rd Qu.:60.00   FIRE DEPARTMENT               :1387   3rd Qu.:6055086  
##  Max.   :89.00   EMERGENCY MEDICAL SERVICES    :1328   Max.   :8999999  
##                  (Other)                       :6757                    
##                                      DIVISION_NAME      JOB_CODE     
##  LIBRARY BOARD OF CONTROL                   : 2487   Min.   :    19  
##  POLICE DEPARTMENT-UNIFORM PATROL BUREAU    : 1257   1st Qu.:105381  
##  FIRE DEPT-FIRE SUPPRESSION  and  PREVENTION: 1161   Median :151380  
##  DPW-STATE HIGHWAY MAINTENANCE              : 1036   Mean   :272848  
##  HEADSTART                                  :  960   3rd Qu.:400074  
##  WIA YOUTH PROGRAM                          :  888   Max.   :999999  
##  (Other)                                    :15511                   
##                            JOB_TITLE       PAY_RANGE       PAY_STEP     
##  LABORER                        : 1823   Min.   : 100   Min.   : 1.000  
##  COLLEGE STUDENT INTERN/CONTRACT: 1788   1st Qu.:1050   1st Qu.: 1.000  
##  WIA SUMMER EMPLOYMENT WORKER   : 1064   Median :2116   Median : 2.000  
##  DISPLACED DISASTER WORKER      :  905   Mean   :4204   Mean   : 4.874  
##  POLICE OFFICER                 :  785   3rd Qu.:9999   3rd Qu.: 8.000  
##  LIBRARY PAGE (20 HOURS)        :  601   Max.   :9999   Max.   :53.000  
##  (Other)                        :16334                                  
##  SEX            RACE        CURRENT_HIRE_DATE EMPLOYMENT_END_DATE
##  F:10908   B      :13628   06/07/2010:  526   null      : 4577   
##  M:12388   W      : 9334   06/01/2009:  510   08/13/2010:  554   
##  U:    4   A      :  114   06/01/2012:  112   07/31/2009:  477   
##            S      :   81   06/09/2016:  108   01/27/2006:  219   
##            O      :   64   06/04/2015:   97   07/08/2016:  131   
##            U      :   56   07/15/2013:   82   07/02/2015:  118   
##            (Other):   23   (Other)   :21865   (Other)   :17224   
##  YEARS_SERVICE    SCHEDULED_HOURS    LONGEVITY_PERCENTAGE  HOURLY_RATE    
##  Min.   : 0.000   Min.   :    0.00   Min.   :0.00000      Min.   : 0.000  
##  1st Qu.: 0.000   1st Qu.:    0.00   1st Qu.:0.00000      1st Qu.: 7.669  
##  Median : 1.000   Median :    0.00   Median :0.00000      Median : 9.750  
##  Mean   : 5.149   Mean   :   17.88   Mean   :0.02129      Mean   :12.458  
##  3rd Qu.: 6.000   3rd Qu.:    0.00   3rd Qu.:0.00000      3rd Qu.:15.684  
##  Max.   :48.000   Max.   : 9999.99   Max.   :0.20000      Max.   :71.157  
##                                                                           
##  TOTAL_HOURLY_RATE OVERTIME_HOURLY_RATE ANNUAL_SALARY    EMPLOYMENT_STATUS
##  Min.   :-3.000    Min.   :  0.00       Min.   :     0   Min.   :0.000    
##  1st Qu.: 7.715    1st Qu.: 12.00       1st Qu.:  7248   1st Qu.:5.000    
##  Median : 9.750    Median : 15.00       Median : 16951   Median :5.000    
##  Mean   :12.840    Mean   : 20.25       Mean   : 21600   Mean   :4.018    
##  3rd Qu.:16.502    3rd Qu.: 25.44       3rd Qu.: 31786   3rd Qu.:5.000    
##  Max.   :76.849    Max.   :115.27       Max.   :148006   Max.   :5.000    
##                                                                           
## 
## > head(subset(PARISH_SALARIES_df, DEPARTMENT_NAME == 
## +     "FIRE DEPARTMENT" & ANNUAL_SALARY > 10000))
##    EMPLOYEE_NUM  LAST_NAME  FIRST_NAME MIDDLE_INITIAL DEPARTMENT_NUM
## 1        502936   EVANS JR     CHARLES              E             51
## 2        442259       MUNN      JORDAN              A             51
## 8        302503       ELAM    KIMBERLY              M             51
## 11       315680 LEBRANE JR     CHARLES              E             51
## 18       503932     REMSON     GARRETT              M             51
## 32       391310    MOUILLE CHRISTOPHER              W             51
##    DEPARTMENT_NAME DIVISION_NUM
## 1  FIRE DEPARTMENT      5120001
## 2  FIRE DEPARTMENT      5120001
## 8  FIRE DEPARTMENT      5110001
## 11 FIRE DEPARTMENT      5120001
## 18 FIRE DEPARTMENT      5120001
## 32 FIRE DEPARTMENT      5120001
##                                  DIVISION_NAME JOB_CODE
## 1  FIRE DEPT-FIRE SUPPRESSION  and  PREVENTION   170105
## 2  FIRE DEPT-FIRE SUPPRESSION  and  PREVENTION   170105
## 8               FIRE DEPARTMENT-ADMINISTRATION   400074
## 11 FIRE DEPT-FIRE SUPPRESSION  and  PREVENTION   170115
## 18 FIRE DEPT-FIRE SUPPRESSION  and  PREVENTION   170105
## 32 FIRE DEPT-FIRE SUPPRESSION  and  PREVENTION   170110
##                          JOB_TITLE PAY_RANGE PAY_STEP SEX RACE
## 1                     FIRE FIGHTER      7201        1   M    B
## 2                     FIRE FIGHTER      7201       12   M    W
## 8  COLLEGE STUDENT INTERN/CONTRACT      9999        1   F    W
## 11                    FIRE CAPTAIN      7220       19   M    B
## 18                    FIRE FIGHTER      7201        3   M    W
## 32                 FIRE EQUIP OPER      7210       14   M    W
##    CURRENT_HIRE_DATE EMPLOYMENT_END_DATE YEARS_SERVICE SCHEDULED_HOURS
## 1         06/03/2013          06/17/2013             0             112
## 2         07/10/2006                null             0             112
## 8         05/27/2003          12/07/2006             3              58
## 11        07/16/1996                null             0             112
## 18        07/15/2013                null             3             112
## 32        07/12/2004                null            12             112
##    LONGEVITY_PERCENTAGE HOURLY_RATE TOTAL_HOURLY_RATE OVERTIME_HOURLY_RATE
## 1                     0     10.0364           10.0364              15.0546
## 2                     0     15.0337           15.0337              25.6412
## 8                     0      8.2500            8.2500              12.3750
## 11                    0     23.1117           23.1117              37.7582
## 18                    0     11.5227           11.5227              20.3747
## 32                    0     17.5440           17.5440              29.4068
##    ANNUAL_SALARY EMPLOYMENT_STATUS
## 1       29226.08                 5
## 2       43778.02                 0
## 8       12441.00                 5
## 11      67301.26                 0
## 18      33554.04                 0
## 32      51088.18                 0

Austin Crime 2014 summary and subset of Zip Code from 78701:

#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austincrime14_subset.R", echo = TRUE)

source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austincrime14_subset.R", echo = TRUE)
## 
## > require("jsonlite")
## 
## > require("RCurl")
## 
## > austinCrime14 <- data.frame(fromJSON(getURL(URLencode("oraclerest.cs.utexas.edu:5001/rest/native/?query=\"select * from austincrime14\""), 
## +     ht .... [TRUNCATED] 
## 
## > summary(austinCrime14)
##              DATE_OF_CRIME                             ADDRESS     
##  2014-10-13 00:00:00:  169   1030 NORWOOD PARK BLVD        :  775  
##  2014-05-19 00:00:00:  165   null                          :  561  
##  2014-03-31 00:00:00:  156   2901 S CAPITAL OF TEXAS HWY NB:  399  
##  2014-01-27 00:00:00:  155   2508 E RIVERSIDE DR           :  311  
##  2014-07-14 00:00:00:  155   9300 S IH 35 SVRD SB          :  258  
##  2014-02-10 00:00:00:  153   11200 LAKELINE MALL DR        :  252  
##  (Other)            :39688   (Other)                       :38085  
##     ZIP_CODE                     CRIME_TYPE     X_COORDINATE  
##  78753  : 3935   THEFT                :11040   3129997:  775  
##  78741  : 3394   BURGLARY OF VEHICLE  : 9993   null   :  688  
##  78758  : 2826   THEFT BY SHOPLIFTING : 4284   3094135:  404  
##  78704  : 2715   BURGLARY OF RESIDENCE: 3980   3120959:  312  
##  78745  : 2577   AUTO THEFT           : 2188   3100423:  261  
##  78702  : 2196   THEFT OF BICYCLE     : 1493   3092524:  252  
##  (Other):22998   (Other)              : 7663   (Other):37949  
##    Y_COORDINATE  
##  10096983:  775  
##  null    :  688  
##  10066586:  402  
##  10059177:  312  
##  10033116:  260  
##  10143916:  252  
##  (Other) :37952  
## 
## > head(subset(austinCrime14, ZIP_CODE == "78701"))
##           DATE_OF_CRIME                 ADDRESS ZIP_CODE
## 38  2014-12-01 00:00:00     500 N IH 35 SVRD SB    78701
## 202 2014-08-20 00:00:00 E 6TH ST / CONGRESS AVE    78701
## 205 2014-09-21 00:00:00      700 BLOCK E 6TH ST    78701
## 209 2014-02-08 00:00:00     700 BLOCK NUECES ST    78701
## 215 2014-01-10 00:00:00      700 BLOCK E 4TH ST    78701
## 218 2014-05-16 00:00:00  700 BLOCK GUADALUPE ST    78701
##                    CRIME_TYPE X_COORDINATE Y_COORDINATE
## 38      BURGLARY OF RESIDENCE      3116639     10069697
## 202        ROBBERY BY ASSAULT      3114310     10070753
## 205        ROBBERY BY ASSAULT      3116520     10070158
## 209 AGG ROBBERY/DEADLY WEAPON      3112654     10071801
## 215        ROBBERY BY ASSAULT      3116340     10069527
## 218        ROBBERY BY ASSAULT      3113235     10071572

Austin Crime 2016 summary and subset for DWI crimes:

#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austincrime_subset.R, echo = TRUE)

source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austincrime_subset.R", echo = TRUE)
## 
## > require("jsonlite")
## 
## > require("RCurl")
## 
## > austinCrime <- data.frame(fromJSON(getURL(URLencode("oraclerest.cs.utexas.edu:5001/rest/native/?query=\"select * from austincrime\""), 
## +     httphe .... [TRUNCATED] 
## 
## > summary(austinCrime)
##  INCIDENT_REPORT_NUMBER                         CRIME_TYPE  
##  Min.   :2.016e+08      CRASH/LEAVING THE SCENE      :1910  
##  1st Qu.:2.017e+09      DWI                          : 394  
##  Median :2.016e+10      LOST PROP                    : 344  
##  Mean   :1.560e+10      DRIVING WHILE LICENSE INVALID: 227  
##  3rd Qu.:2.017e+10      THEFT                        : 224  
##  Max.   :2.016e+12      POSS MARIJUANA               : 217  
##                         (Other)                      :3699  
##              DATE_OF_CRIME       TIME      LOCATION_TYPE
##  2016-03-17 00:00:00:  48   Min.   :   0   null:7015    
##  2016-02-26 00:00:00:  47   1st Qu.: 736                
##  2016-03-18 00:00:00:  47   Median :1454                
##  2016-01-21 00:00:00:  46   Mean   :1315                
##  2016-05-07 00:00:00:  46   3rd Qu.:1917                
##  2016-04-02 00:00:00:  45   Max.   :2359                
##  (Other)            :6736                               
##                                   ADDRESS       STATE     
##  E 6TH ST / TRINITY ST                :  75   Texas:7015  
##  UNKNOWN                              :  73               
##  E 6TH ST / NECHES ST                 :  47               
##  E 6TH ST / SAN JACINTO BLVD          :  39               
##  E RIVERSIDE DR / S PLEASANT VALLEY RD:  34               
##  E 7TH ST / TRINITY ST                :  33               
##  (Other)                              :6714               
##                      COUNTRY             CONTINENT       LATITUDE      
##  United States of America:7015   North America:7015   Min.   : 0.0001  
##                                                       1st Qu.:30.2430  
##                                                       Median :30.2690  
##                                                       Mean   :30.2211  
##                                                       3rd Qu.:30.3261  
##                                                       Max.   :30.5149  
##                                                                        
##    LONGITUDE       
##  Min.   :-97.9533  
##  1st Qu.:-97.7489  
##  Median :-97.7350  
##  Mean   :-97.5232  
##  3rd Qu.:-97.7037  
##  Max.   : -0.0001  
##                    
## 
## > head(subset(austinCrime, CRIME_TYPE == "DWI"))
##     INCIDENT_REPORT_NUMBER CRIME_TYPE       DATE_OF_CRIME TIME
## 30              2016270145        DWI 2016-01-27 00:00:00  356
## 157            20161959936        DWI 2016-07-14 00:00:00  422
## 296             2016950771        DWI 2016-04-04 00:00:00 1250
## 557            20161851392        DWI 2016-07-03 00:00:00 1842
## 558            20161859584        DWI 2016-07-04 00:00:00  241
## 940            20161331200        DWI 2016-05-12 00:00:00  328
##     LOCATION_TYPE                                        ADDRESS STATE
## 30           null                       S 1ST ST / CUMBERLAND RD Texas
## 157          null            S 1ST ST / W BEN WHITE BLVD SVRD WB Texas
## 296          null                       DENSON DR / N LAMAR BLVD Texas
## 557          null               CENTER RIDGE DR / MC CALLEN PASS Texas
## 558          null                       W 7TH ST / RIO GRANDE ST Texas
## 940          null N IH 35 SVRD NB / E MARTIN LUTHER KING JR BLVD Texas
##                      COUNTRY     CONTINENT LATITUDE LONGITUDE
## 30  United States of America North America  30.2382  -97.7621
## 157 United States of America North America  30.2269  -97.7692
## 296 United States of America North America  30.3304  -97.7235
## 557 United States of America North America  30.4143  -97.6634
## 558 United States of America North America  30.2709  -97.7492
## 940 United States of America North America  30.2785  -97.7291

Shiny App Images:

Below is our first shiny app visualization. Here we used Salary data from employees in the city of Austin and divided the data into a set (Female vs Male). We wanted to visualize the departments in Austin where women earned more annually than men (and vice-versa) and we were able to accomplish this. (Code used to produce app can be found in 03 Shiny folder) === Here is our visualization of departments were Men earned more annually than Women.

=== Here, we focus on our other dataset regarding crime around Austin. This part of the shiny app lets the user choose a crime, which then produces a visual heat map of where the crime occurred in Austin. There’s also a data table at the bottom for reference to the crimes shown and not shown on the graph.

=== Here is another visulaztion showing where crime occurred in Austin. Just like the previous image, this part of the app allows the user to choose a crime and produce a visualization of the crime locations. In this case, we utilize a bin structure to present the crime locations rather than the previous heat map.


=== This next part of shiny app utilizes the joining of two datasets (Austin crime 2014 and Austin crime 2016). Creating the join was accomplished by combining both datasets on a common address. This allowed us to see what locations in Austin had different crimes occuring at the same place. This joining mechanism also allowed us to see which zip codes had the most crime, which are shown in the bar graph below. And finally, the user can now visually see where these crimes occurred by selecting a zip code.

===

This marks the end of our shiny app, and here we go back to our Austin Salary data. The produced graphs below provide the user with both interesting data in regard to ethnicities as well as producing basic visualizations for users wanting perform statistical analysis of the data (more visulazitions will be needed.)

URL to Article: http://www.austintexas.gov/edims/document.cfm?id=207153

URL to Article: http://www.theadvocate.com/baton_rouge/news/business/article_f02a27c3-116f-5048-ad1b-9eb20b2abf01.html

The articles state that in Austin women make 83 cents on the dollar compared to men. In Baton Rouge women make 65 cents on the dollar compared to men. However from our data we found that Austin City employees that are women only make 64 cents on the dollar and in Baton Rouge women make 66 cents on the dollar. This shows that Baton Rouge is consistent with its pay for women across the state. However, the City of Austin pays far less to women compared to men than what the city averages.

===

This graph produced in Tableau shows amount of crime per zip code in Austin by shading each area either light (low crime) or dark (high crime)

===

This tableau dashboard provides a visualization comparing crime in 2014 vs 2016.

Shiny Application

Here is a link to our Shiny Application: https://jnjock32.shinyapps.io/final/